03. Biases in Data Collection

Lesson 3 03 Biases In Data Collection

Summary of the types of biases in Data Collection

Selection Bias

  1. Nonresponse bias
  2. Voluntary response bias
     • Random sampling can provide strong protection against voluntary response bias.
  3. Undercoverage
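
To make the point about random sampling concrete, here is a minimal simulation sketch in Python. The population, the 0.8 and 0.2 response rates, and the helper name share_growing are all made-up assumptions for illustration, not part of the lesson material.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 companies, half of them growing.
population = [True] * 5000 + [False] * 5000  # True = company is growing

# Voluntary response: assume growing companies choose to answer far more
# often than shrinking ones (0.8 vs 0.2 are made-up response rates).
voluntary = [g for g in population if rng.random() < (0.8 if g else 0.2)]

# Simple random sample: every company has the same chance of being selected.
random_sample = rng.sample(population, 1000)


def share_growing(sample):
    """Fraction of companies in the sample that are growing."""
    return sum(sample) / len(sample)


true_share = share_growing(population)
print(f"population:    {true_share:.2f}")
print(f"voluntary:     {share_growing(voluntary):.2f}")
print(f"random sample: {share_growing(random_sample):.2f}")
```

Under these assumptions the voluntary respondents should overstate the share of growing companies (near 0.80 instead of 0.50), while the random sample should stay close to the population value. Random selection only helps if the selected units actually respond, so nonresponse can still bias the result.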

Response Bias

  1. Leading Questions
  2. Social Desirability

Missing Variables

  1. Features that were not captured during data collection but that affect the analysis and the final recommendation
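
As a rough illustration of how a missing feature can flip a recommendation, the sketch below compares two campaigns with and without a customer-segment feature. The campaign names, segments, and counts are hypothetical numbers chosen for the example.

```python
# Hypothetical counts: (conversions, visitors) per campaign and customer segment.
results = {
    "A": {"new": (90, 100), "returning": (300, 500)},
    "B": {"new": (400, 500), "returning": (50, 100)},
}


def rate(conversions, visitors):
    return conversions / visitors


def overall_rate(campaign):
    # What the analyst sees when the segment feature was never collected.
    conversions = sum(c for c, _ in results[campaign].values())
    visitors = sum(v for _, v in results[campaign].values())
    return rate(conversions, visitors)


for campaign, segments in results.items():
    by_segment = {seg: round(rate(*counts), 2) for seg, counts in segments.items()}
    print(campaign, "overall:", round(overall_rate(campaign), 2), "by segment:", by_segment)
```

With these made-up counts, campaign B looks better overall (0.75 vs 0.65), yet campaign A converts better within both segments (0.90 vs 0.80 and 0.60 vs 0.50). An analysis that never recorded the segment would recommend the wrong campaign.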

Survivorship Bias

  1. Brands that exist in the collection today are only the ones that survived; the churn of brands that dropped out has implications for an analysis and its interpretation
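
A small sketch of the effect, assuming a hypothetical list of brands with made-up growth figures and a flag for whether each brand is still in the collection:

```python
# Hypothetical brands: (name, annual growth in %, still in the collection today?)
brands = [
    ("brand_a", 12.0, True),
    ("brand_b", 8.0, True),
    ("brand_c", 10.0, True),
    ("brand_d", -15.0, False),  # churned
    ("brand_e", -5.0, False),   # churned
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Average growth if we only look at brands that still exist today.
survivors_only = mean(growth for _, growth, alive in brands if alive)

# Average growth once the churned brands are included as well.
all_brands = mean(growth for _, growth, _ in brands)

print("survivors only:", survivors_only)  # 10.0
print("all brands:    ", all_brands)      # 2.0
```

Looking only at surviving brands suggests 10% average growth, while including the churned brands gives 2%. The numbers are illustrative, but the gap shows why churned records should be kept in the dataset where possible.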

Additional Resources on Overcoming Selection and Response bias

  1. Simple random sampling is often cited as a way to address biases during data collection. Check out this blog describing some methods.
  2. If you are interested in academic papers, we also recommend the article "Addressing Selection Bias in Event Studies with General Purpose Social Media Panels" by Princeton faculty Han Zhang and Microsoft researchers Shawndra Hill and David Rothschild.
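
For readers who want to try simple random sampling directly, here is a minimal sketch using Python's standard random module. The sampling frame, the urban/rural split, and the helper name rural_share are assumptions made up for the example.

```python
import random

# Hypothetical sampling frame: resident IDs tagged with where they live.
frame = [(i, "urban") for i in range(8000)] + [(i, "rural") for i in range(8000, 10000)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simple random sample: 500 residents drawn without replacement,
# each with the same probability of selection.
srs = rng.sample(frame, 500)

# Convenience sample for comparison: the first 500 entries, all urban.
convenience = frame[:500]


def rural_share(sample):
    """Fraction of sampled residents who live in a rural area."""
    return sum(1 for _, area in sample if area == "rural") / len(sample)


print(f"frame:       {rural_share(frame):.2f}")
print(f"random:      {rural_share(srs):.2f}")
print(f"convenience: {rural_share(convenience):.2f}")
```

In this setup the convenience sample contains no rural residents at all (undercoverage), while the simple random sample should land near the frame's 20% rural share. A simple random sample can only represent people who appear in the frame, so an incomplete frame still leads to undercoverage.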

Quiz 1

QUIZ QUESTION:

Match each selection bias example to its type.

ANSWER CHOICES:

  • Asking a survey group their salaries as part of a feature dataset.
  • Polling a college group about their political preferences as a feature set to represent the larger population’s opinion.
  • Asking a group of older residents to take a 5-question survey via smartphone as part of a feature dataset.
  • Phoning entrepreneurs to ask about their financial growth and primarily getting responses from companies that are growing.
  • Running polls in urban areas as part of a feature dataset.

SOLUTION:

Selection Bias

Type

Polling a college group about their political preferences as a feature set to represent the larger population’s opinion.

Phoning entrepreneurs to ask about their financial growth and primarily getting responses from companies that are growing.

Running polls in urban areas as part of a feature dataset.

Asking a survey group their salaries as part of a feature dataset.

Asking a group of older residents to take a 5 question survey via smartphone as part of a feature dataset.

Lesson 3 08 Q A Convenient Sample

Quiz 2

Which of the following statements are not true?

SOLUTION:
  • Random sampling is a good way to reduce __response bias__.
  • To guard against bias from undercoverage, use a __convenience sample__.
  • To guard against __nonresponse bias__, use a mail-in survey.